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1 . '  Introduction 

This  paper  presents  an  aoaptive  curve  fitting  algorithm  based 
upon  a  look-ahead  strategy.  This  algorithm  is  a  modi fication  of  two 
earlier  curve  fitting  packages.  In  the  first  section  a  discussion  of 
the  capabilities  of  the  original  routines  is  given  and  in  the  second 
section  the  new  adaptive  algorithm  is  described.  The  last  section  of 
the  paper  presents  some  numerical  testing  results. 

2.  Original  1 1,  i?  Packages- 

In  [1],  a  FORTRAN  program  is  given  which  computes  a  smooth 
piecewise  polynomial  approximation,  p,  to  a  finite  one-dimensional 
data  set  ((x^,  y^)}™^  using  the  least  squares  (j^)  or  approxi¬ 

mation  operator.  In  order  to  describe  these  algorithms  it  is  convenient 
to  use  a  functional  notation.  Thus,  let  X  =  {x .  T^l=1  and  f  be  the  function 
defined  on  X  by  f(x.)  =  y^,  i  =1,  . m.  Let  a  =  min{x:  x  G  X)  and 
b  =  max{x:  x  G  X}  and, for  any  function  g  defined  on  Y  C  X  we  use  the 
notation  ||g||Y  to  represent  max{|g(x)j:  x  G  Y}. 

A  user  of  the  code  given  in  [1]  must  supply  certain  input  parameters 
beyond  the  data  {(x.. ,  y . )  }r^_-|  •  These  additional  parameters  include  N  and 
SMTH,  where  N  -  1  is  the  desired  degree  of  each  approximating  polynomial 
piece  and  SMTH  is  the  desired  smoothness  or  number  of  continuous  derivatives 
of  the  fit  on  [a,  b] .  The  user  must  also  input  a  value  for  the  desired 
percent  error,  PE,  which  is  used  to  calculate  the  tolerance,  TOL,  which  the 
approximating  pieces  must  meet.  That  is,  each  approximating  polynomial 
piece  must  fit  the  data  in  its  interval  of  definition  with  a  maximum 
absolute  pointwise  error  less  than  TOL.  In  addition,  the  parameter  NPTS 
must  be  supplied.  This  parameter  allows  the  addition  of  artificial  data 
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points.  In  particular,  for  a  sparse  data  set,  the  user  can  add  additional 
data  points  by  setting  NPTS  >  0.  In  this  case,  the  subroutine  LINEAR  will 
add  NPTS  new  points  by  linear  interpolating  between  consecutive  data  points. 

In  this  context,  the  and  Packages  will  calculate  an  approximation 
p  to  f  and  a  set  of  points  (knots)  {t^  }..QC  X  with  a=tg  <  tj  <  ...  <  -j  <  tk=b 
such  that 

a.  p  restricted  to  [ t ^ _ i ,  t^]  is  a  polynomial  p..  G  =  (q:  q  is  a  real 
algebraic  polynomial  of  degree  <_  N  -  1}. 

b.  p  has  SMTH  continuous  derivatives. 

c.  || f  -  p||x  <  TOL. 

The  first  step  in  the  ^  and  packages  is  to  determine  the  location 
of  the  first  knot  t-j .  In  order  to  do  this,  the  codes  find  t-j ,  the  largest 
point  in  X  such  that 

a.  [a,  t-j]  OX  contains  at  least  max(2,  N  +  1)  points,  and 

b.  If  p.|  is  the  best  approximation  to  f  from  Tr^  on  S-j  =  [a,  t-|]  fl  X, 
then  || f  -  p.j  |ls  <_  TOL. 

If  t|  =  b,  the  algorithm  successfully  terminates  with  no  interior  knots. 

If  no  such  t-j  exists,  then  the  algorithm  aborts  and  an  appropriate  error 
message  is  printed. 

If  t.|  exists,  a  <  t-|  <  b,  and  SMTH  _>  1,  then  the  right  endpoint  of 
the  first  polynomial  piece  is  determined  by  "backing  off"  from  t-j  to  a 
suitable  point  in  S^j  using  the  following  procedure. 

The  l  =  N  -  SMTH  -  1  largest  extreme  points  of  the  error  function 
f(x)  -  p-j(x)  are  selected  from  the  interval  S-|  and  one  of  them  will  be 
used  for  the  knot  t-| .  The  knot  is  chosen  in  this  manner  because,  in  the 
continuous  setting,  if  f  is  differentiable  and  £  is  an  interior  relative 
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extreme  point  of  f(x)  -  p-j(x),  then  f’(c)  -  p-| (c )  =  0  so  that  f'(c)  =  p-j(c). 
Thus,  joining  p£  to  p-j  at  c  smoothly  will  force  P2  to  have  the  same  direction 
as  f  at  the  knot  and  should  damp  out  unwanted  oscillations. 

In  order  to  decide  which  extreme  point  to  use  for  t-j »  the  code  first 
computes  the  values  f'Uj),  f'Uj),  -f 1  (?A)  where  f'Uv)  is  an  approxi¬ 
mation  for  the  "slope"  of  the  data  at  determined  by  the  centered  quadratic 
interpolation  of  the  data  at  and  its  two  immediate  neighbors.  The  code 
then  chooses  the  largest  l  such  that  |f’Uv)  -  pjUy)|  £  TOL.  If  no  such 
£v  exists,  then  t-j  is  chosen  to  be  the  largest  at  which  !f,(Sv)-pj(C  )| 
attains  its  minimum.  This  procedure  *s  repeated  on  the  intervals  [tj,  b], 

. ...  [tn.  b]  until  t  =  b. 


3.  The  New  Algorithm 

The  original  ji-|  and  packages  approximated  with  polynomial  pieces 

N  , 

in  standard  form  (eg.,  p.s(x)  =  £  c  x  ).  In  the  new  algorithm,  the 

n=l  n 

N 

polynomial  pieces  are  of  the  form  p^(x)  =  \  cn(x  -  x*)  where  x*  is 

n=l  n 


the  left  endpoint  of  the  subinterval  on  which  p - (x )  is  defined.  However, 
the  main  focus  of  this  algorithm  is  a  new  "backing  off"  strategy.  This 
new  "backing  off"  strategy  can  be  characterized  as  a  look-ahead  strategy. 

In  the  first  step,  the  code  will  attempt  to  fit  the  entire  data  set 
on  S  =  [a,  b]  0  X  with  a  polynomial  p  €  such  that  ||f  -  P  1 1 ^  <.  TOL. 

If  this  is  not  possible,  then  the  algorithm  will  find  the  largest  element 
of  X,  t-j,  such  that  the  best  approximation  p-j  to  f  on  Sj  =  [a,  t:-|]  H  X 
satisfies  |jf  -  P-]||$  £  TOL.  This  is  done  via  a  halving  procedure. 


Specifically,  if  it  is  found  that  the  best  approximation  on  [a,  b]  does 
not  satisfy  the  tolerance  criterion  then  the  code  attempts  to  fit  the 
data  on  the  first  N  +  1  points  of  X.  If  it  is  unable  to  satisfy  the 
tolerance  criterion  on  this  set,  the  code  aborts,  printing  an  appropriate 
error  message.  If  the  error  criterion  is  satisfied  on  this  small  set 
then  the  point  t^  is  computed  via  "bisection"  initialized  with  the  right 
endpoint  of  this  small  set  and  b.  The  code  will  then  attempt  to  fit  the 

'X, 

rest  of  the  data  using  t-|  as  the  current  left  endpoint.  Specifically, 

the  best  approximation  p2  to  f  on  S2  =  [t-j ,  b]  Pi  X  subject  to  pp^t-j) 

=  p|J^(^),  j  =  0,  1,  ...,  NSMTH  is  calculated.  If  ||f  -  p2||  <  TOL 

holds,  then  the  code  successfully  terminates  with  t-j  taken  as  our  knot 

t.| .  If  |[  f  -  P  2 1 1 3  >  "^L  then  bisection  procedure  is  invoked  once 

again  to  find  t2  £  X  such  that  t.,  is  the  largest  element  of  X  for  which 

the  best  approximation  p2  to  f  on  S2  =  [t-j ,  H  X  subject  to  p2^(t-|) 

=  p|J^(t-j),  j  =  0,  1 ,  ...,  NSMTH  satisfies  ||f  -  P2II5  £  TOL.  The  code 

stores  the  current  second  subinterval  [t-j ,  £.,]  and  the  polynomial  piece 

P2  as  the  current  knots  and  the  best  approximation  to  date.  The  code  will 

then  "back  off"  one  point  from  t-j  to  ^  ,  the  point  of  X  immediate  proceding 

t.| .  Then  the  above  procedure  is  applied  to  [a,  t-j].  That  is,  t2  e  X  is 

calculated  such  that  t2  is  the  largest  element  of  X  for  which  the  best 

approximation  p2  to  f  on  S2  =  [t^ ,  t2]  f!  X  subject  to  p2J'^(t^)  =  pjJ  ^(t-|), 

j  =  0,  ....  NSMTH  satisfies  ||f  -  p  1 1 £  £  TOL.  If  t»  s  b  then  the  algorithm 

2  2  . 

successfully  terminates  with  the  two  polynomial  pieces  p-j  and  p,  and  the 
knot  t-j  as  the  desired  fit.  If  t2  _<  t2  occurs  the  p^  and  p2  and  the  knot 

t-j  remains  the  current  candidate  for  the  first  two  pieces  and  common  knot 

for  the  final  fit.  If  t-j  >  ^2  occurs  than  p2,  t^  and  are  replaced  by 


Pj,  and  respectively,  and  the  above  backing  off  procedure  is 
continued.  At  this  point  two  options  are  available.  In  the  first 
option,  denoted  as  COMPUT  I,  the  code  will  continue  backing  off,  one 
point  at  a  time,  and  store  the  current  t-j ,  and  the  approximating 
polynomial  piece,  P£,  that  approximates  farthest  to  the  right  subject 
to  the  smoothness  constraints  and  tolerance  criterion  as  the  current 
knots  and  best  second  approximation  piece  replacing  any  previous 
lesser  attempts.  If  the  code  backs  off  two  consecutive  points  without 
fitting  farther  to  the  right  subject  to  the  smoothness  constraints  and 
given  tolerance,  then  the  code  will  accept  the  current  t, ,  and  p^  as 
the  knots  and  best  approximation  for  the  second  subinterval  and  begin 
work  on  a  third  subinterval  which  has  t,  as  its  left  endpoint  prior  to 
"backing  off".  In  the  second  option,  denoted  as  COMPUT  II,  the  code 
will  back  off  a  fixed  number  of  points  in  an  attempt  to  improve  the 
current  approximation.  This  fixed  number  is  the  greatest  integer  less 
than  1/2  the  number  of  points  in  [a,  t-j]  C\  X,  t-j  =  the  initial  value  for 

tl  * 

4.  Numerical  Results 

In  most  cases,  the  original  and  packages  and  the  modified 
algorithms  presented  here  produced  similar  results.  Precisely,  in  68 
out  of  79  data  sets  tested  with  the  £-|  packages  and  in  64  out  of  83  data 
sets  tested  with  the  Packages  similar  results  were  produced  by  the 
original  and  the  new  algorithms.  Some  examples  illustrating  this  are 
shown  in  Figures  1-12  on  six  given  data  sets. 

In  our  testing  we  found  that  the  original  t-j  and  packages  produced 
better  results  than  the  new  versions  in  a  small  but  significant  number  of 
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cases.  We  believe  that  this  is  primarily  due  to  the  fact  that  the  new 
versions  are  designed  with  the  objective  of  reducing  the  number  of  knots 
by  extending  the  length  of  each  subinterval  as  far  as  possible  subject 
to  the  given  smoothness  and  tolerance  requirements.  In  doing  this,  the 
new  methods  try  to  be  too  perfect  too  soon  and  are  unable  to  recover  later 
on  in  the  approximation.  One  could  say  that  our  methods  suffer  from 
greediness.  An  example  of  this  behavior  is  given  by  the  Titanium  data 
tested  with  the  packages  (Figures  15,  16).  Here  the  new  method  causes 
piecewise  interpolation  to  result.  Note  that  this  instability  of  the  new 
algorithm  is  not  found  in  the  original  algorithm.  On  the  other  hand,  for 
a  few  data  sets,  such  as  the  data  set  Test  50000  the  new  algorithm  for  the 
^2  package  (Figures  13,  14)  reduced  the  number  of  knots  needed.  Thus,  we 
cannot  claim  that  the  original  £-j  and  packages  will  always  produce 
better  fits  than  the  revised  versions.  Likewise,  there  were  five  data  sets 
where  the  original  package  is  superior  to  the  new  £-j  package.  However, 
there  were  also  six  data  sets  where  the  new  package  produced  a  better 
fit  than  the  original  ^  package.  For  example,  the  ABS(SIN(X))  data  set 
with  noise  and  bad  points  was  tested  using  the  t-j  packages  (Figures  17,  18). 
Here  the  original  package  attempted  to  fit  some  of  the  bad  points,  whereas 
the  new  version  did  not. 

In  the  testing  of  these  codes  it  was  observed  that  the  COMPUT  I  and 
COMPUT  II  options  gave  the  same  results  for  all  162  data  sets  tested. 

Our  results  may  not  be  conclusive  for  all  cases  but  we  will  assume  the 
following  statement  is  reasonable.  In  general,  backing  off  until  two 
consecutive  attempts  do  not  improve  the  fit  subject  to  the  smoothness 
constraints  and  given  tolerance  will  suffice. 
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Further  backing  up  will  not  allow  the  current  approximating  piece  to 
extend  farther  to  the  right. 

In  summary,  based  upon  the  limited  testing  that  we  performed,  it 
appears  that  the  original  and  codes  are  preferable  to  the  modified 
codes  presented  here.  We  believe  this  conclusion  is  warranted  due  to  the 
fact  that  the  original  codes  appear  to  be  more  robust  than  the  new  codes 
using  the  look-ahead  strategy  and  that  when  both  codes  performed  satis¬ 
factorily,  they  gave  similar  results. 

As  a  final  note  we  also  wish  to  point  out  that  our  study  also  illus¬ 
trates  one  problem  facing  a  user  of  such  software.  Namely,  at  the  outset 
there  is  no  way  to  predict  the  effectiveness  of  a  given  data  fitting  code. 
A  good  illustration  of  this  is  given  in  the  t-j  and  fits  of  the  Titanium 
data  (Figures  19  and  20).  Here  the  code  was  successful  whereas  the 
*2  code  was  not.  Since  this  particular  data  is  not  particularly  unusual, 
it  is  somewhat  surprising  that  these  two  codes  should  vary  so  widely  on 
it.  We  believe  that  this  is  an  indication  of  the  need  for  more  research 
on  the  many  problems  involved  with  data  fitting. 
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